Mining hyperintervals: Getting to grips with real-valued data
Abstract
Many data mining tasks, such as clustering, classification, the construction of decision trees, subgroup discovery and itemset mining, often cope poorly with real-valued data. In fact, it is common for data mining methods to work well only on nominal data with few distinct values. We build the theory to fill this gap for data from arbitrary uncountable sets and introduce an efficient method to mine such data without the usual discretization as a pre-processing step. It is shown that discretization is not needed in order to make use of the MDL principle.
Similar resources
RealKrimp - Finding Hyperintervals that Compress with MDL for Real-Valued Data
The MDL Principle (induction by compression) is applied with meticulous effort in the Krimp algorithm for the problem of itemset mining, where one seeks exceptionally frequent patterns in a binary dataset. As is the case with many algorithms in data mining, Krimp is not designed to cope with real-valued data, and it is not able to handle such data natively. Inspired by Krimp’s success at using ...
(T) FUZZY INTEGRAL OF MULTI-DIMENSIONAL FUNCTION WITH RESPECT TO MULTI-VALUED MEASURE
Introducing more types of integrals will provide more choices to deal with various types of objectives and components in real problems. Firstly, in this paper, a (T) fuzzy integral, in which the integrand, the measure and the integration result are all multi-valued, is presented with the introduction of T-norm and T-conorm. Then, some classical results of the integral are obtained based on the prope...
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
A Study of Improving the Performance of Mining Multi-Valued and Multi-Labeled Data
Nowadays data mining algorithms are successfully applied to analyze real-life data and provide useful suggestions. Since some available real data is multi-valued and multi-labeled, researchers have focused their attention on developing approaches to mine multi-valued and multi-labeled data in recent years. Unfortunately, no algorithms can discretize multi-valued and multi-lab...
Pattern Discovery for Locating Motifs in Multivariate, Real-valued Time-series Data
The problem of locating motifs in multivariate, real-valued time series data concerns the discovery of sets of recurring patterns embedded in the time series. Each set is composed of several nonoverlapping subsequences and constitutes a motif because all of the subsequences are similar. This task is a natural extension of univariate motif discovery in both the symbolic and real-valued domains a...
Publication date: 2012